Probability Based Clustering for Document and User Properties
Authors
Abstract
Information Retrieval systems can be improved by exploiting context information such as user and document features. This article presents a model based on overlapping probabilistic or fuzzy clusters for such features. The model is applied within a fusion method which linearly combines several retrieval systems. The fusion is based on weights for the different retrieval systems which are learned by exploiting relevance feedback information. This calculation can be improved by maintaining a model for each document and user cluster. That way, the optimal retrieval system for each document or user type can be identified and applied. The extension presented in this article allows overlapping, probabilistic clusters of features to further refine the process.
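The abstract does not give the fusion formula, so the following is only a minimal sketch of one way the idea could look, under stated assumptions: each cluster keeps its own weight vector over the retrieval systems, and a document (or user) that belongs to several clusters with different membership degrees gets a fused score that is the membership-weighted average of the per-cluster linear combinations. The function name `fuse_score`, the cluster names, and all numbers are hypothetical and do not come from the paper.

```python
from typing import Dict, List


def fuse_score(
    system_scores: List[float],
    cluster_weights: Dict[str, List[float]],
    memberships: Dict[str, float],
) -> float:
    """Fuse scores from several retrieval systems for one document.

    system_scores: one retrieval score per system for this document.
    cluster_weights: per cluster, one weight per retrieval system
        (assumed to have been learned from relevance feedback).
    memberships: degree to which the document belongs to each cluster;
        clusters may overlap, so several degrees can be non-zero.
    """
    total_membership = sum(memberships.values())
    if total_membership <= 0:
        raise ValueError("document must belong to at least one cluster")

    fused = 0.0
    for cluster, degree in memberships.items():
        weights = cluster_weights[cluster]
        if len(weights) != len(system_scores):
            raise ValueError(f"weight vector of '{cluster}' has wrong length")
        # Linear combination of the systems using this cluster's weights.
        cluster_score = sum(w * s for w, s in zip(weights, system_scores))
        # Each cluster contributes in proportion to the membership degree.
        fused += (degree / total_membership) * cluster_score
    return fused


# Hypothetical example: two retrieval systems, two overlapping clusters.
scores = [0.8, 0.2]
weights = {"short_docs": [0.5, 0.5], "long_docs": [1.0, 0.0]}
membership = {"short_docs": 0.25, "long_docs": 0.75}
result = fuse_score(scores, weights, membership)
```

With a hard (non-overlapping) clustering, exactly one membership degree would be 1 and the sketch reduces to applying a single cluster's weights; the overlapping case blends the weight vectors instead.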
Similar resources
Evolutionary User Clustering Based on Time-Aware Interest Changes in the Recommender System
The plenty of data on the Internet has created problems for users and has caused confusion in finding the proper information. Also, users' tastes and preferences change over time. Recommender systems can help users find useful information. Due to changing interests, systems must be able to evolve. In order to solve this problem, users are clustered that determine the most desirable users, it pa...
RRLUFF: Ranking function based on Reinforcement Learning using User Feedback and Web Document Features
Principal aim of a search engine is to provide the sorted results according to user’s requirements. To achieve this aim, it employs ranking methods to rank the web documents based on their significance and relevance to user query. The novelty of this paper is to provide user feedback-based ranking algorithm using reinforcement learning. The proposed algorithm is called RRLUFF, in which the rank...
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researches to use...
Hierarchical Fuzzy Clustering Semantics (HFCS) in Web Document for Discovering Latent Semantics
This paper discusses about the future of the World Wide Web development, called Semantic Web. Undoubtedly, Web service is one of the most important services on the Internet, which has had the greatest impact on the generalization of the Internet in human societies. Internet penetration has been an effective factor in growth of the volume of information on the Web. The massive growth of informat...
Document Clustering Based on Ontology and a Fuzzy Approach
Data mining, also known as knowledge discovery in database, is the process to discover unknown knowledge from a large amount of data. Text mining is to apply data mining techniques to extract knowledge from unstructured text. Text clustering is one of important techniques of text mining, which is the unsupervised classification of similar documents into different groups. The most important step...
Journal title:
- CoRR
Volume abs/1102.3865, Issue -
Pages -
Publication date 2001